-
Notifications
You must be signed in to change notification settings - Fork 188
[ENH] BEP036 - Phenotypic Data Guidelines #2123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Upstream PR
Quick update before merging our PR on surchs fork
BEP036 brings guidelines for best tabular phenotypic data to the BIDS specification. - Includes an appendix called `phenotype.md` - Includes admonitions for the guidelines in-line with modality agnostic files sections --------- Co-authored-by: Eric Earl <[email protected]> Co-authored-by: Samuel Guay <[email protected]> Co-authored-by: Sebastian Urchs <[email protected]> Co-authored-by: Arshitha B <[email protected]>
Changed "e.g." to "for example" to follow contributing style guidelines.
for more information, see https://pre-commit.ci
src/appendices/phenotype.md
Outdated
each `phenotype/<measurement_tool_name>.json` data dictionary. | ||
This improves reusability and provides clarity about the measurement tool. | ||
|
||
### 5. Use the demographics file for common variables about participants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copying from https://github.com/surchs/bids-specification/pull/1/files#r2103117486
For this section, would it make sense to suggest that demo-like information be prioritized in this file rather than participants.tsv
, making the latter primarily a list of subject IDs? I haven't seen this explicitly addressed anywhere, though I'm unsure if it's something we want to formalize 😬
Something like this could follow the paragraph?:
When all demographic data is stored in
phenotype/demographics.tsv
,participants.tsv
may serve primarily as a minimal listing of subject identifiers with only theparticipant_id
column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. It'd be good to mention this.
Put the phenotypic and assessment data content where it belongs.
|
||
In case of multiple sessions there is an option of adding additional | ||
`sessions.tsv` files describing variables changing between sessions. | ||
`sessions.tsv` files describing each session and variables changing between sessions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link to an example dataset? Note to self.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Arshitha if you decided against providing an example, please resolve this.
} | ||
} | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link an example dataset for this case? Note to self.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Arshitha Perhaps something like: "see (some specific) example in [the phenotype appendix](link)
"?
1. To encode the acquisition time for a measurement tool’s `session_id`, | ||
add the `session_id` to the sessions file and | ||
include the OPTIONAL `acq_time` column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be rephrased for better clarity. Will add a suggestion shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awaiting suggestion. 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericearl Sebastian (@surchs ) beat me to it with this commit suggestion #2123 (comment)
… related files Attempt to address GitHub comments from Arshitha and apply correct filetree macros.
Trying to make the dev validator happier with the the participant_id column optionally first in sessions files.
Missed a column for the Sessions file: run_id.
Missed the session_id column being 2nd for Phenotype.
- Added in a new guideline 7 to encourage the use of participants and sessions files for different uses. - Re-numbered old guidelines 7-9 to 8-10.
Removing excess line I forgot to remove earlier. Thanks remark CI!
.vscode/settings.json
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is added by accident in surchs@0eba71d @ericearl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I will try to get it out of there.
src/appendices/phenotype.md
Outdated
1. Each row MUST start with `participant_id`. | ||
|
||
1. Each TSV file MUST contain a `session_id` column when | ||
multiple [sessions](../glossary.md#session-entities)[^1] are present | ||
in the data set regardless of whether those sessions are in | ||
the `phenotype/` data, `sub-<label>/` data, or a combination of the two. | ||
|
||
1. If more than one of the same measurement tool is acquired within | ||
the same `session_id`, a `run_id` column MUST be added. | ||
|
||
1. To encode the acquisition time for a measurement tool’s `session_id`, | ||
add the `session_id` to the sessions file and | ||
include the OPTIONAL `acq_time` column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1. Each row MUST start with `participant_id`. | |
1. Each TSV file MUST contain a `session_id` column when | |
multiple [sessions](../glossary.md#session-entities)[^1] are present | |
in the data set regardless of whether those sessions are in | |
the `phenotype/` data, `sub-<label>/` data, or a combination of the two. | |
1. If more than one of the same measurement tool is acquired within | |
the same `session_id`, a `run_id` column MUST be added. | |
1. To encode the acquisition time for a measurement tool’s `session_id`, | |
add the `session_id` to the sessions file and | |
include the OPTIONAL `acq_time` column. | |
a. Each row MUST start with `participant_id`. | |
b. Each TSV file MUST contain a `session_id` column when | |
multiple [sessions](../glossary.md#session-entities)[^1] are present | |
in the data set regardless of whether those sessions are in | |
the `phenotype/` data, `sub-<label>/` data, or a combination of the two. | |
c. If more than one of the same measurement tool is acquired within | |
the same `session_id`, a `run_id` column MUST be added. | |
d. To encode the acquisition time for a measurement tool’s `session_id`, | |
add the `session_id` to the sessions file and | |
include the OPTIONAL `acq_time` column. |
Or just regular list? I wouldn't nest one top-level enumeration in another.
Optional: Yes | ||
|
||
An aggregated sessions file CAN be provided at the dataset root. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really optional though, right? Or rather, it's only optional in the sense that you could chose the other option and make subject-level sessions.tsv
files.
As mentioned above: I would take a stance here and make one of the two options the recommended one. To me that would be the root-level
file. And then we can say a word on why we recommend the option.
## Sessions file | ||
|
||
Template: | ||
### Option 1: Segregated sessions files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't comment on the main heading for Sessions file:
There is a lot of commonality b/w the root-level and subject-level sessions.tsv
. i.e. everything about what kind of info should go in them and what they are for. So how about we pull that info up under the heading. And then only explain the differences in the two options sections
`sessions.json` example: | ||
|
||
```JSON | ||
{ | ||
"participant_id": { | ||
"Description": "Participant identifier" | ||
}, | ||
"session_id": { | ||
"Description": "Session identifier for the session", | ||
"Levels": { | ||
"ses-predrug": "session before drug administration", | ||
"ses-postdrug": "session after drug administration", | ||
"ses-followup": "follow-up session" | ||
} | ||
}, | ||
"acq_time": { | ||
"Description": "Acquisition time of the session" | ||
}, | ||
"systolic_blood_pressure": { | ||
"Description": "Systolic blood pressure measured at the beginning of the session in mmHg" | ||
} | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned above: this should go under the Sessions File heading - right now the example table of which columns to put in a sessions.tsv
is listed under the "segregated" option, but the data dictionary under the "aggregated" option
Accidental file.
Added in easily-agreeable suggestions in a batch. Co-authored-by: Sebastian Urchs <[email protected]>
for more information, see https://pre-commit.ci
Attempt to address more of @surchs comments.
Thanks for catching that excess newline, remark!
Remove acq_time as a phenotype column recommendation/option, as it should go into the sessions file instead.
Remove acq_time__phenotype from columns.yaml since it was removed from the rest of the schema.
Accept Sebastian's suggestion about the phrasing of guideline 8. Co-authored-by: Sebastian Urchs <[email protected]>
for more information, see https://pre-commit.ci
Changing "subject-level" to "participant-level" in sessions files section.
To better differentiate demographic data from phenotypic data
The BEP leads can meet as-needed to discuss this BEP PR
Coordinate a meeting by emailing Eric Earl: [email protected].
Communicate on this PR to provide feedback otherwise.
HTML preview of this BEP
BEP036 brings guidelines for best tabular phenotypic data to the BIDS specification.
phenotype.md
Additional Links
Co-authored-by: Eric Earl [email protected] @ericearl
Co-authored-by: Samuel Guay [email protected] @SamGuay
Co-authored-by: Sebastian Urchs [email protected] @surchs
Co-authored-by: Arshitha Basavaraj [email protected] @Arshitha